tests: fix parametrize patterns rejected by pytest 9.1.0 by leofang · Pull Request #2212 · NVIDIA/cuda-python

leofang · 2026-06-14T07:11:29Z

Summary

main has been red since pytest 9.1.0 landed on PyPI: every Test linux-* / Test win-* matrix entry fails at pytest collection time, before any actual test runs. The cause is two unrelated latent bugs in our test code, both tolerated by older pytest but rejected by pytest 9.1.0's stricter parametrize validation:

Bug 1: trailing comma in `parametrize` name (cuda_core)

cuda_core/tests/test_utils.py:151 had:

@pytest.mark.parametrize("in_arr,", _cpu_array_samples())

The `,` inside the string was stray. pytest 9.1.0 splits names on commas, ends up with one name but 3-tuple values, and fails collection with:

in "parametrize" the number of names (1):
  ['in_arr']
must be equal to the number of values (3):
  (665115599, 23133, 0)

Fix: drop the trailing comma.
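
As a rough illustration of the corrected pattern, here is a minimal sketch. The `_samples()` generator and the test name are hypothetical stand-ins (the body of the real `_cpu_array_samples()` is not shown in this PR description); the only point is that a single name with no trailing comma lets each sample reach the test as one argument.

```python
import pytest


def _samples():
    # Hypothetical stand-in for _cpu_array_samples(): each case is ONE value
    # (here a 3-element tuple) that should reach the test as a single argument.
    return [(2, 3, 4), (5, 0, 1)]


# Fixed form: one name, no trailing comma inside the string.
@pytest.mark.parametrize("in_arr", _samples())
def test_accepts_single_argument(in_arr):
    # The whole tuple arrives as in_arr; it is not unpacked into three names.
    assert len(in_arr) == 3
```

Because the decorator only attaches a mark, the difference between `"in_arr"` and `"in_arr,"` shows up at collection time, which matches the failure mode described above.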

Bug 2: `indirect=True` override of a fixture-level parametrize (cuda_bindings)

cuda_bindings/tests/test_nvfatbin.py has an arch fixture parametrized with params=ARCHITECTURES. Two tests overrode it via @pytest.mark.parametrize("arch", ["sm_80"], indirect=True). pytest 9.1.0 now rejects this as:

duplicate parametrization of 'arch'

Fix: extract the CUBIN-building logic from the CUBIN fixture into a _build_cubin(arch) helper, drop the indirect override on the two affected tests, and call the helper directly with "sm_80" (preserving the original intent — those tests intentionally used only sm_80, since target arch "75" must not match the CUBIN's arch).

Backwards compatibility

Both fixes are pytest-version-agnostic, so the pip pin (pytest>=6.2.4) doesn't need to change. Verified by collecting against three pytest versions (minimal repros, included below for reproducibility):

pytest	broken pattern 1	fixed 1	broken pattern 2	fixed 2
9.1.0	collection error	clean	collection error	clean
9.0.2	clean (tolerant)	clean	clean (tolerant)	clean
8.4.2	clean (tolerant)	clean	clean (tolerant)	clean

Reference

Affected CI runs on main:

Same pattern on my open #2210: https://github.com/NVIDIA/cuda-python/actions/runs/27489049015 — 38 Run cuda.core tests failures + 23 Run cuda.bindings tests failures all stem from these two collection errors.

Two latent test-code bugs that older pytest tolerated but pytest 9.1.0 flags as collection errors, breaking every Test job on main since the pytest 9.1.0 release:

* cuda_core/tests/test_utils.py:151 had a stray trailing comma in the `parametrize` name string (`"in_arr,"`). pytest 9 now splits names on comma and counts them, mismatching against the multi-element value tuples. Drop the comma.
* cuda_bindings/tests/test_nvfatbin.py had two tests using `@pytest.mark.parametrize("arch", ["sm_80"], indirect=True)` to override the fixture-level `arch` parametrization. pytest 9 now rejects this combination as "duplicate parametrization of 'arch'". Extract the CUBIN-building logic into a `_build_cubin(arch)` helper, drop the indirect override on the two tests, and call the helper inline with the hardcoded `"sm_80"` they need. Preserves intent (the override existed because target arch "75" must not match the CUBIN's arch).

Both fixes are pytest-version-agnostic; verified collecting cleanly under pytest 9.1.0, 9.0.2, and 8.4.2 with minimal reproductions of each pattern.

copy-pr-bot · 2026-06-14T07:11:34Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

rwgk

I saw ... it can only get better!

rwgk

GPT-5.5:

No code findings from my review.

The two edits look correct and narrowly scoped:

cuda_core/tests/test_utils.py: fixes the stray @pytest.mark.parametrize("in_arr,", ...) name. _cpu_array_samples() supplies one argument per case, so in_arr is the intended single parameter name.
cuda_bindings/tests/test_nvfatbin.py: extracts the old CUBIN fixture body into _build_cubin(arch), keeps the fixture behavior unchanged, and lets the two mismatch tests build only sm_80 without re-parametrizing the existing arch fixture.

Operationally, I would not call the PR merge-ready until full CI runs. Right now the visible checks only include path-label/restricted-path/metadata checks plus pre-commit.ci, and the copy-pr-bot comment says the PR still needs validation before NVIDIA runner workflows can run. Code-wise this looks ready to test; process-wise it still needs the full CI trigger and a green run before merging.

rwgk · 2026-06-14T17:57:46Z

/ok to test fadd5bd

github-actions · 2026-06-15T00:45:35Z

Doc Preview CI
Preview removed because the pull request was closed or merged.

Backport of #2212, scoped down to the cuda_bindings/tests/test_nvfatbin.py portion that applies to 12.9.x. The cuda_core/tests/test_utils.py portion of #2212 (the trailing-comma-in-parametrize-name fix) does not apply here because the 12.9.x version of that test file does not have the bug; its parametrize uses two names matching tuple values.

What is fixed (verbatim from #2212): cuda_bindings/tests/test_nvfatbin.py had two tests using @pytest.mark.parametrize("arch", ["sm_80"], indirect=True) to override the fixture-level `arch` parametrization. pytest 9.1.0 now rejects this combination as "duplicate parametrization of 'arch'". Extract the CUBIN-building logic into a _build_cubin(arch) helper, drop the indirect override on the two tests, and call the helper inline with the hardcoded "sm_80" they need. Preserves intent (the override existed because target arch "75" must not match the CUBIN's arch).

Closes #2226. Hunk body verified identical to the corresponding hunk in #2212 (commit a9156b6).

Two nightly failure fixups after the first green iteration:

nightly-numba-cuda-mlir: numba-cuda-mlir 0.4.0 has an inverted guard that registers an overload of np.row_stack on NumPy 2.x, and NumPy 2.5 removed that name entirely, so test collection fails with "AttributeError: module 'numpy' has no attribute 'row_stack'". Cap numpy to <2.5. See NVIDIA/numba-cuda-mlir#154.

nightly-cuda-core: released cuda-core v1.0.1's test suite uses a parametrize argvalues pattern that pytest 9.1 rejects ("in parametrize the number of names (1)... must be equal to the number of values (3)"). The main-side fix was NVIDIA#2212 but it has not shipped in a cuda-core release yet. Cap pytest to <9.1 for the released-cuda-core test run only.

* CI: add nightly-cuda-core and nightly-numba-cuda-mlir modes

  nightly-cuda-core: test the released cuda-core from PyPI against main-built pathfinder and cuda-bindings, catching the "core released × bindings main" gap documented in issue #1955. Runs on linux-64 (a100) and win-64 (a100 MCDM).

  nightly-numba-cuda-mlir: MLIR-backend companion to nightly-numba-cuda. Installs main pathfinder+bindings+core plus numba-cuda-mlir from PyPI, runs numba-cuda-mlir's own test suite from the matching git tag. Linux amd64/arm64 x CUDA 12.9.1 / 13.3.0.

  Both modes fetch the released version's tests from git tags because the respective wheels do not ship test_*.py files. Includes tag-not-found fallback (log warning + exit 0) to avoid red-lining the nightly on a freshly-cut PyPI release that hasn't been pushed to git yet.

* ci/test-matrix.yml: fix CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM typo

  The two ENV overrides intended to exercise the per-thread default stream code path were misspelled (missing the CUDA_ segment), so the env var was silently ignored and the PTDS coverage added in #1972 had no effect. Rename to the correct CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM. Refs #971.

* cuda_pathfinder: pin nvshmem to <3.7 (was previously excluding only 3.7.0)

  nvidia-nvshmem-cu{12,13} 3.7.x breaks the main branch, not only 3.7.0. Widen the exclusion from an exact-version bump to <3.7 so 3.7.x and above are avoided until we can move forward.

* nightly-numba-cuda-mlir: swap arm64 for win-64 coverage, use rtxpro6000

  Drop the linux-aarch64 rows and instead add win-64 coverage with the same CUDA 12.9.1 / 13.3.0 pair. Switch all four rows from GPU l4 to rtxpro6000. Windows rows use DRIVER_MODE MCDM, matching the existing rtxpro6000 CUDA 13.3.0 patterns.

* Temporarily add push trigger to ci-nightly.yml for testing

  Remove before merging.

* CI: switch nightly-{cuda-core,numba-cuda-mlir} to actions/checkout for tests

  The initial approach used git inside the ubuntu:24.04 container to fetch the released version's test suite, but git is not installed on that container (install_unix_deps only pulls in jq/wget/g++/etc.) and its absence made the run steps silently skip via the tag-not-fetchable fallback. On Windows, git archive of just the cuda_core subtree also hit a dangling-symlink extraction failure (cuda_core/.git_archival.txt).

  Refactor to:
  - run-tests: just install wheels and expose the resolved release version (CUDA_CORE_RELEASED_VER / NUMBA_CUDA_MLIR_VER) and cuda-core test-group name via GITHUB_ENV. No more git operations.
  - test-wheel-{linux,windows}.yml: add an actions/checkout step per mode that pulls the matching release tag into a subdirectory (cuda-core-released / numba-cuda-mlir-released), then the follow-up test step installs that tag's test dep-group and runs pytest.

  For numba-cuda-mlir also pass --ignore=tests/benchmarks --ignore=tests/doc_examples to pytest: those directories import the `numba` package at module top and would fail collection, which is cuSIMT's expected behavior (see NVIDIA/numba-cuda-mlir#136; cuSIMT intentionally does not depend on numba).

* CI: pin numpy<2.5 (mlir) and pytest<9.1 (cuda-core released tests)

  Two nightly failure fixups after the first green iteration:

  nightly-numba-cuda-mlir: numba-cuda-mlir 0.4.0 has an inverted guard that registers an overload of np.row_stack on NumPy 2.x, and NumPy 2.5 removed that name entirely, so test collection fails with "AttributeError: module 'numpy' has no attribute 'row_stack'". Cap numpy to <2.5. See NVIDIA/numba-cuda-mlir#154.

  nightly-cuda-core: released cuda-core v1.0.1's test suite uses a parametrize argvalues pattern that pytest 9.1 rejects ("in parametrize the number of names (1)... must be equal to the number of values (3)"). The main-side fix was #2212 but it has not shipped in a cuda-core release yet. Cap pytest to <9.1 for the released-cuda-core test run only.

* CI: deselect known pre-existing failures in nightly-cuda-core and nightly-numba-cuda-mlir

  Applied only in the affected nightly-* pytest invocations; the released source trees under test are unmodified.

  nightly-numba-cuda-mlir (all 10 tests deselected are from cuSIMT):
  * CudaArraySetting::{test_no_sync_default_stream, test_no_sync_supplied_stream, test_sync}
    TestCudaArrayInterface::{test_consume_no_sync, test_consume_sync, test_launch_no_sync, test_launch_sync, test_launch_sync_two_streams, test_fortran_contiguous}
    Serial-pytest contamination of numba_cuda_mlir.cuda.cudadrv from an xfailed test in test_nrt_comprehensive.py. Upstream CI runs with `pytest -n auto --dist loadscope`, which isolates the offending side effect in a separate xdist worker; our nightly runs serially and hits the pollution. See NVIDIA/numba-cuda-mlir#135.
  * TestLinkerDumpAssembly::test_nvjitlink_jit_with_linkable_code_lto_dump_assembly_warn
    Subprocess-invokes `cuobjdump`, which isn't on PATH in the base ubuntu:24.04 container. Filed as an upstream skip-guard bug.

  nightly-cuda-core (3 tests deselected are pre-existing v1.0.1 issues):
  * test_enum_coverage.py::test_wrapper_covers_all_binding_members[NvlinkVersion]
    Expected drift: main cuda-bindings adds NvlinkVersion.VERSION_6_0 which v1.0.1's wrapper mapping predates. This mode intentionally pairs released core with main bindings, so this coverage-style test will stay red here until a cuda-core release catches up.
  * test_rlcompleter_patch.py::test_opt_out_env_var_disables_patch_even_when_interactive
    Environment-dependent test: expects rlcompleter to crash without the tab-completion patch, but on Windows MCDM the pre-patch behavior is clean. Passes on Linux, fails on Windows MCDM.
  * test_memory.py::test_non_managed_resources_report_not_managed[pinned]
    Same underlying "Failed to allocate memory from pool" error that v1.0.1 already xfails in the sibling test_pinned_memory_resource_initialization (TODO(#9999)). cuda-python main has since fixed the parametrized case to route through _allocate_pinned_buffer_or_xfail(), but that fix hasn't shipped in a cuda-core release yet.

* CI: tighten deselects to per-platform failing sets

  Previously applied the same list on both Linux and Windows workflows, which over-deselected; some tests only fail on one platform because the underlying issues (serial-pytest test-order in mlir, MCDM-only behavior in cuda-core) are platform-specific. Now:

  nightly-numba-cuda-mlir
    linux-64: TestCudaArrayInterface::{test_consume_no_sync, test_consume_sync, test_launch_no_sync, test_launch_sync, test_launch_sync_two_streams, test_fortran_contiguous} + TestLinkerDumpAssembly::test_nvjitlink_jit_with_linkable_code_lto_dump_assembly_warn.
    win-64: CudaArraySetting::{test_no_sync_default_stream, test_no_sync_supplied_stream, test_sync} + TestCudaArrayInterface::test_fortran_contiguous.

  Test-order contamination in numba-cuda-mlir#135 surfaces different tests depending on collection order (linux-64 vs win-64 exercise different subsets), so the per-platform lists differ. cuobjdump-based TestLinkerDumpAssembly only fires on Linux because the ubuntu:24.04 container's PATH lacks cuobjdump; Windows runners ship it with the local CTK.

  nightly-cuda-core
    linux-64: test_enum_coverage.py::test_wrapper_covers_all_binding_members[NvlinkVersion].
    win-64: NvlinkVersion (same as Linux) + test_rlcompleter_patch.py::test_opt_out_env_var_disables_patch_even_when_interactive + test_memory.py::test_non_managed_resources_report_not_managed[pinned].

  rlcompleter and pinned mempool tests only fail on Windows MCDM. NvlinkVersion fails on both (expected drift for the mode).

* CI: version-gate the nightly-mode deselects so they auto-clean

  Each deselect is now wrapped in a bash conditional keyed on the installed release version. When a newer numba-cuda-mlir or cuda-core release ships with the referenced fix, the nightly picks it up automatically, the guard evaluates false, and the deselect drops, so the tests run against the new release. If they still fail we hear about it loudly rather than silently masking a regression.

  Current guards:
  - numba-cuda-mlir #135 tests + cuobjdump TestLinkerDumpAssembly: applied when installed numba-cuda-mlir version <= 0.4.0.
  - cuda-core NvlinkVersion / rlcompleter opt-out / pinned mempool: applied when installed cuda-core version <= 1.0.1.

  Structure keeps one conditional block per (mode, platform) with a comment above each deselect explaining the tracking issue.

* CI: broaden mlir deselect list to full #135 union across platforms

  The previous per-platform-tight lists were incomplete: NVIDIA/numba-cuda-mlir#135's import-time contamination poisons whichever tests reference cuda.cudadrv.driver AFTER the polluting xfail runs, and collection order varies between runs. Two consecutive Windows CI runs failed on different subsets (3 slicing tests one run, 5 interface tests the next).

  Deselect the full union of #135-listed tests + test_fortran_contiguous (observed to hit the same contamination) on both Linux and Windows. Same version guard (<= 0.4.0) still applies, so the whole block drops automatically when a newer numba-cuda-mlir release ships with the root-cause fix. Linux keeps the extra cuobjdump deselect (Linux-only environment issue).

* Revert "cuda_pathfinder: pin nvshmem to <3.7 (was previously excluding only 3.7.0)"

  This reverts commit 2a42aa7.

* Revert "Temporarily add push trigger to ci-nightly.yml for testing"

  This reverts commit a0ccd19.
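
The commit above describes the version gate as a bash conditional in the workflow; that script is not reproduced here. Purely to illustrate the idea, the following Python sketch builds pytest `--deselect` arguments only while the installed release is at or below the last known-affected version. The function names and the simplified version parser are assumptions made for this example, and `"1.0.2"` is a hypothetical future release.

```python
def parse_version(text):
    # Simplified parser for plain "X.Y.Z" release strings only; pre-release
    # or local version suffixes are not handled in this sketch.
    return tuple(int(part) for part in text.split("."))


def deselect_args(installed, last_affected, test_ids):
    """Return pytest --deselect arguments while installed <= last_affected."""
    if parse_version(installed) <= parse_version(last_affected):
        # One "--deselect <node id>" pair per known failure.
        return [arg for test_id in test_ids for arg in ("--deselect", test_id)]
    # A newer release is installed: run everything so regressions are loud.
    return []


known_failures = [
    "test_enum_coverage.py::test_wrapper_covers_all_binding_members[NvlinkVersion]",
]

# Guard active for the affected release, dropped for a (hypothetical) newer one.
print(deselect_args("1.0.1", "1.0.1", known_failures))
print(deselect_args("1.0.2", "1.0.1", known_failures))
```

Returning an empty list once the guard is false mirrors the "auto-clean" behavior described above: the deselected tests simply start running again.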

github-actions Bot added cuda.bindings Everything related to the cuda.bindings module cuda.core Everything related to the cuda.core module labels Jun 14, 2026

rwgk approved these changes Jun 14, 2026

rwgk marked this pull request as ready for review June 14, 2026 18:30

rwgk enabled auto-merge (squash) June 14, 2026 18:30

rwgk assigned leofang Jun 14, 2026

rwgk added this to the cuda.bindings next milestone Jun 14, 2026

rwgk added the P0 High priority - Must do! label Jun 14, 2026

rwgk merged commit a9156b6 into NVIDIA:main Jun 14, 2026
108 of 110 checks passed

leofang deleted the leofang/fix-pytest9-collection-errors branch June 15, 2026 13:48

This was referenced Jun 15, 2026

BUG: pytest 9.1.0 breaks CI for the 12.9.x branch #2226

Closed

tests: fix duplicate parametrization rejected by pytest 9.1.0 (#2212 backport) #2227

Merged

tests: fix parametrize patterns rejected by pytest 9.1.0 #2212
rwgk merged 1 commit into NVIDIA:main from leofang:leofang/fix-pytest9-collection-errors
